Customer Shopping Trend Analysis

Jingwen REN/Dan ZHAO/Yining KONG/Shanwen TANG

2024-03-04

Project Recap

The Customer Shopping Preferences Dataset reveal the purchasing patterns and preferences of our clients, and encompasses a broad spectrum of customer attributes.

we aim to achieve core objectives:

  • Data-Driven Product Optimization

  • Strategic Marketing Innovations

  • Enhancement of Customer Experience

Analytics Team

We are the Analytics Team in JD_KT Audit Company:

Jingwen REN

Dan ZHAO

Yining KONG

Shanwen TANG

Packages Prepare

library(tidyverse)
library(knitr)
library(gt)
library(readxl)
library(readr)
library(ggplot2)
library(dataMaid)
library(RSQLite)
library(tidytext) 
library(dplyr)
library(gridExtra)
library(lubridate)
library(janitor)
library(ggthemes)

Glimpse the data

shopping_trends_updated.csv

Feature: Number of observations( 3900 ) Number of variables( 18 )

shopping_trends_2 <- read_csv("shopping_trends_updated.csv")

shopping_trends.csv

Feature: Number of observations( 3900 ) Number of variables( 19 )

Data Cleaning

## Clean the col names
shopping_trends_1 <-shopping_trends_1 %>%
  clean_names()%>%
  setNames(tools::toTitleCase(names(.)))

shopping_trends_2 <-shopping_trends_2 %>%
  clean_names()%>%
  setNames(tools::toTitleCase(names(.)))

## Check the results
shopping_trends_1 %>%
  head(1)

shopping_trends_2%>%
  head(1)

Data Cleaning

## Use SQLite function to join two tables
con <- dbConnect(RSQLite::SQLite(), dbname = "your_database.db")
dbWriteTable(con, "shopping_trends_2_table", shopping_trends_2,overwrite = TRUE)
dbWriteTable(con, "shopping_trends_1_table", shopping_trends_1,overwrite = TRUE)

query <- "
SELECT 
  shopping_trends_2_table.*,
  shopping_trends_1_table.Preferred_payment_method
FROM 
  shopping_trends_2_table
LEFT JOIN 
  shopping_trends_1_table 
ON 
  shopping_trends_2_table.Customer_id = shopping_trends_1_table.Customer_id
"
result1 <- dbGetQuery(con, query)

apply(result1, 2, max, na.rm = TRUE)
             Customer_id                      Age                   Gender 
                  "3900"                     "70"                   "Male" 
          Item_purchased                 Category      Purchase_amount_usd 
               "T-shirt"              "Outerwear"                    "100" 
                Location                     Size                    Color 
               "Wyoming"                     "XL"                 "Yellow" 
                  Season            Review_rating      Subscription_status 
                "Winter"                    "5.0"                    "Yes" 
           Shipping_type         Discount_applied          Promo_code_used 
          "Store Pickup"                    "Yes"                    "Yes" 
      Previous_purchases           Payment_method   Frequency_of_purchases 
                    "50"                  "Venmo"                 "Weekly" 
Preferred_payment_method 
                 "Venmo" 

Data Summary

shopping_trends <- as.data.frame(result1)

summary(shopping_trends)
  Customer_id          Age           Gender          Item_purchased    
 Min.   :   1.0   Min.   :18.00   Length:3900        Length:3900       
 1st Qu.: 975.8   1st Qu.:31.00   Class :character   Class :character  
 Median :1950.5   Median :44.00   Mode  :character   Mode  :character  
 Mean   :1950.5   Mean   :44.07                                        
 3rd Qu.:2925.2   3rd Qu.:57.00                                        
 Max.   :3900.0   Max.   :70.00                                        
   Category         Purchase_amount_usd   Location             Size          
 Length:3900        Min.   : 20.00      Length:3900        Length:3900       
 Class :character   1st Qu.: 39.00      Class :character   Class :character  
 Mode  :character   Median : 60.00      Mode  :character   Mode  :character  
                    Mean   : 59.76                                           
                    3rd Qu.: 81.00                                           
                    Max.   :100.00                                           
    Color              Season          Review_rating  Subscription_status
 Length:3900        Length:3900        Min.   :2.50   Length:3900        
 Class :character   Class :character   1st Qu.:3.10   Class :character   
 Mode  :character   Mode  :character   Median :3.70   Mode  :character   
                                       Mean   :3.75                      
                                       3rd Qu.:4.40                      
                                       Max.   :5.00                      
 Shipping_type      Discount_applied   Promo_code_used    Previous_purchases
 Length:3900        Length:3900        Length:3900        Min.   : 1.00     
 Class :character   Class :character   Class :character   1st Qu.:13.00     
 Mode  :character   Mode  :character   Mode  :character   Median :25.00     
                                                          Mean   :25.35     
                                                          3rd Qu.:38.00     
                                                          Max.   :50.00     
 Payment_method     Frequency_of_purchases Preferred_payment_method
 Length:3900        Length:3900            Length:3900             
 Class :character   Class :character       Class :character        
 Mode  :character   Mode  :character       Mode  :character        
                                                                   
                                                                   
                                                                   

Analysis 1

Group customers according to age and the proportion of male and female in each group

Age_group <- function(age) {
  if (age >= 18 & age <= 35) {
    return("Youth")
  } else if (age > 35 & age <= 60) {
    return("Middle_age")
  } else {
    return("Elderly")
  }
}


df <- data.frame(Age = shopping_trends$Age, Gender = shopping_trends$Gender)
df$Age_group <- sapply(df$Age, Age_group)
summary_df <- df %>%
  dplyr::group_by(Age_group, Gender) %>%
  dplyr::summarize(count = n(), .groups = "drop") %>%
  dplyr::group_by(Age_group) %>%
  dplyr::mutate(total_count = sum(count),
         percentage = paste0(round(count / total_count * 100, 2), "%")) %>%
  dplyr::ungroup() %>%
  dplyr::arrange(Age_group, desc(Gender)) %>%
  dplyr::select(-total_count)

Analysis results

summary_df %>%
  gt()
Age_group Gender count percentage
Elderly Male 495 68.46%
Elderly Female 228 31.54%
Middle_age Male 1259 67.54%
Middle_age Female 605 32.46%
Youth Male 898 68.39%
Youth Female 415 31.61%

Result

The Middle age Group (35-60) has the larest number(including male and female).

The proportion of male is larger than female in all groups.

Analysis 2

The most popular category, and the three largest items proportion in this category

# 2.1The most popular category
 Category_group <- shopping_trends %>%
  group_by(Category) %>%
  summarise(count = n()) %>%
  arrange(desc(count))
pie(Category_group$count, 
    labels = Category_group$Category, main = "Pie Chart")
## The most popular category is Clothing

Analysis 2

# 2.2 The three largest items proportion in Clothing
 items_category <- dplyr:::select(shopping_trends,Item_purchased, Category)
 items_category_1 <-items_category %>%
   dplyr:::filter(Category == "Clothing")%>%
   group_by(Item_purchased)%>%
   summarise(count = n())%>%
   slice_max(order_by=Item_purchased,
            n=3)
items_category_1 %>%
  gt()
Item_purchased count
T-shirt 147
Sweater 164
Socks 159

Result

T-shirt, Sweater, Socks are the largest items in the largest category.

Analysis 3

The Season correlate with the purchase amount

shopping_trends$Season <- factor(shopping_trends$Season, levels = c("Spring", "Summer", "Fall", "Winter"))
ggplot(shopping_trends) +
 aes(x = Season, y = Purchase_amount_usd) +
 geom_boxplot(fill = "#112446") +
 labs(x = "Season", y = "Purchase amount USD", title = "Season vs. Purchase Amount ") +
theme(axis.text.x = element_text(size = 15))+
 theme_minimal()
## The season of Fall is the highest in Purchase amount 

Analysis 4

The most common shipping type for customers with a review rating above 4

common_type <- shopping_trends %>%
  filter(Review_rating > 4) %>%
  group_by(Shipping_type) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  slice(1)
common_type %>%
  gt()
Shipping_type n
Standard 272

Result

The most common shipping type for customers with a review rating above 4 is Standard.

Analysis 5

Best Color sold in each Season

season_color_sales <- shopping_trends %>%
  group_by(Season, Color) %>%
  summarise(Count = n()) 
top_colors <- season_color_sales %>%
  group_by(Season) %>%
  top_n(1, Count)
top_colors <- top_colors[match(c("Spring", "Summer", "Fall", "Winter"),
                               top_colors$Season), ]
top_colors %>%
  ungroup() %>%
  gt()
Season Color Count
Spring Olive 52
Summer Silver 59
Fall Magenta 50
Winter Green 50

Analysis 6

The average purchase amount for customers who have made more than 30 previous purchases

avg_purchase <- mean(
  shopping_trends$Purchase_amount_usd[
    shopping_trends$Previous_purchases > 30
    ],
  na.rm = TRUE
  ) %>%
  round(digits = 2)
print(avg_purchase)
[1] 60.03

Result

The average purchase amount is 60.03.

Analysis 7

The average purchase amount for customers who have a subscription and used Venmo as the payment method

# Filter for customers with a subscription and who used Venmo
filtered_data <- shopping_trends %>%
  filter(Subscription_status == "Yes", Payment_method == "Venmo")

# Calculate the average purchase amount
average_purchase_amount <- filtered_data %>%
  summarise(AverageAmount = mean(Purchase_amount_usd))

#print(average_purchase_amount)

Result

The average purchase amount for this customer group is 57.5 USD.

Analysis 8

Locations with stronger purchase power and more subscribed members

cat('<div style="max-height: 350px; overflow-y: auto;">')
## Filter the subscribed members
subscribed_members <- shopping_trends %>%
  filter(Subscription_status == "Yes")

## TOP6 Locations with stronger purchase power
top6_purchase_power <- subscribed_members %>%
  group_by(Location) %>%
  summarise(Total_Purchase_Amount = sum(Purchase_amount_usd)) %>%
  top_n(6, Total_Purchase_Amount)



## TOP6 Locations with most Subscription members
top6_members_count <- subscribed_members %>%
  group_by(Location) %>%
  summarise(Members_Count = n_distinct(Customer_id)) %>%
  top_n(6, Members_Count)



## bar charts
top_purchases_plot <- 
  ggplot(top6_purchase_power,
         aes(x = reorder(Location, -Total_Purchase_Amount), 
             y = Total_Purchase_Amount, fill = Location)) + 
  geom_bar(stat = "identity") + 
  geom_text(aes(label = sprintf("%.2f", Total_Purchase_Amount)), 
            stat = "identity", 
            position = position_stack(vjust = 0.5), 
            colour = "white", size = 2) +
  scale_fill_brewer(palette = "Set2", direction = 1) +
  ggthemes::theme_fivethirtyeight() +
  theme(text = element_text(size = 9),
        plot.title = element_text(size = 9),
        axis.text.x = element_text(angle = 45, hjust = 1),
        legend.key.size = unit(0.3, "cm"),
        legend.title = element_blank()) +
  labs(title = "Top 6 Locations by Total Purchase(USD)", x = "Location", y = "Total Purchase Amount (USD)")

top_members_plot <- 
  ggplot(top6_members_count, 
         aes(x = reorder(Location, -Members_Count), 
             y = Members_Count, fill = Location)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = sprintf("%0.0f", Members_Count)), 
            stat = "identity", 
            position = position_stack(vjust = 0.5), 
            colour = "white", size = 2) +
  scale_fill_brewer(palette = "Set2", direction = 1) +
  ggthemes::theme_fivethirtyeight() +
  theme(text = element_text(size = 9),
        plot.title = element_text(size = 9),
        axis.text.x = element_text(angle = 45, hjust = 1),
        legend.key.size = unit(0.3, "cm"),
        legend.title = element_blank()) +
  labs(title = "Top 6 Locations by Members", x = "Location", y = "Members Count")
cat('</div>')

Missouri ranks among the top in terms of member registrations, yet their purchasing power is not very strong. Oklahoma has relatively strong purchasing power (ranking 5th among the 50 states), but they do not have many registered members.

grid.arrange(top_purchases_plot, top_members_plot, ncol = 2)

Recommendations

For Missouri:
  • Increase exclusive member discounts: To enhance purchasing power, offer exclusive discounts and double points accumulation for members, encouraging consumption.

  • Enhance user engagement: By hosting exclusive events for members, increase their sense of participation and brand loyalty, thereby enhancing their willingness to purchase.

For Oklahoma:
  • Referral reward program: Implement a referral reward program to encourage existing members to refer new users to register as members, attracting more users through word-of-mouth.

  • Coupons and trials: Provide coupons or free trials for new registered users to lower their initial purchase threshold.

Recommendations

  • For customers of different age groups, develop marketing campaigns that specifically target the Middle Age demographic, focusing on their interests and values. Given that the elderly and youth have higher counts, implement loyalty programs.
  • Identify popular categories and products, target marketing efforts towards trending products, keep a close eye on inventory turnover rates, especially the stock levels of Sweaters, T-Shirts, and Socks to meet the high demand in the Clothing category.
  • Based on the impact of product ratings on sales, optimize low-rated products, communicate customer feedback to suppliers to improve product quality, and enhance after-sales and logistics services.
  • Optimize discount and promotional strategies, increase advertising for discounts/membership registration to enhance product exposure.

Project Division

Here is our work division among four group members

  • REN Jingwen: Intro + Data Cleaning
  • ZHAO Dan: Q1 + Q2 + Q3
  • KONG Yining: Q4 + Q5 + Q6
  • TANG Shanwen: Q7 + Q8 + Recommendation

References

  • https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset/data

  • https://quarto.org/docs/presentations/revealjs/

  • https://bookdown.org/yihui/rmarkdown/markdown-syntax.html